The Odd Cycle Transversal problem (OCT) asks whether a given graph can be
made bipartite by deleting at most k of its vertices. In a breakthrough result Reed, Smith,
and Vetta (Operations Research Letters, 2004) gave an $O(4^k kmn)$ time algorithm for it, the
first algorithm with polynomial runtime of uniform degree for every fixed k. It is known
that this implies a polynomial-time compression algorithm that turns OCT instances into
equivalent instances of size at most $O(4^k)$, a so-called kernelization. Since then the existence
of a polynomial kernel for OCT, i.e., a kernelization with size bounded polynomially in k,
has turned into one of the main open questions in the study of kernelization. Despite the
impressive progress in the area, including the recent development of lower bound techniques
(Bodlaender et al., ICALP 2008; Fortnow and Santhanam, STOC 2008) and meta-results
on kernelizations for graph problems on planar and other sparse graph classes (Bodlaender
et al., FOCS 2009; Fomin et al., SODA 2010), the existence of a polynomial kernel for OCT
has remained open, even when the input is restricted to be planar.

This work provides the first (randomized) polynomial kernelization for OCT. We introduce
a novel kernelization approach based on matroid theory, where we encode all relevant
information about a problem instance into a matroid with a representation of size polynomial
in k. For OCT, the matroid is built to allow us to simulate the computation of the
iterative compression step of the algorithm of Reed, Smith, and Vetta, applied (for only
one round) to an approximate odd cycle transversal which it is aiming to shrink to size k.
The process is randomized with one-sided error exponentially small in k, where the result
can contain false positives but no false negatives, and the size guarantee is cubic in the size
of the approximate solution. Combined with an $O(\sqrt{\log n})$-approximation (Agarwal et al.,
STOC 2005), we get a reduction of the instance to size $O(k^{4.5})$, implying a randomized
polynomial kernelization. Interestingly, the known lower bound techniques can be seen to
exclude randomized kernels that produce no false negatives, as in fact they exclude even
co-nondeterministic kernels (Dell and van Melkebeek, STOC 2010). Therefore, our result
also implies that deterministic kernels for OCT cannot be excluded by the known machinery.

1 Introduction 

One of the most successful (and natural) applications of parameterized complexity is the study 
of combinatorially hard problems for the case that one seeks a small solution. Such a problem 
is fixed-parameter tractable (FPT) if it can be checked in time $f(k) \cdot n^{O(1)}$ whether an instance of
size n has a solution of size at most (or at least) k. When k is not too large, such an algorithm 
can be considered efficient. This can be especially important for minimization problems where 
the solution size corresponds to a real-world cost. 

Curiously, for any decidable problem, having an FPT algorithm is known to coincide with 
having a polynomial-time data reduction algorithm that reduces any instance to an equivalent 
instance with size bounded by some function of k, a so-called kernelization.¹ However, the
kernel size bound implied is the same function f{k) as occurs in the running time bound, which 
for non-trivial parameters will almost certainly be exponential in k unless P = NP. 

A more useful notion of efficient data reduction is polynomial kernels, i.e., kernelizations with 
kernel size bounded polynomially in the parameter. For many problems, this can be achieved 
by a direct study of kernelization, e.g., the classic reduction of Vertex Cover to 2k vertices
[50, 16], or the recent reduction of Feedback Vertex Set to size $O(k^2)$ by Thomassé [57],
improving on work by Burrage et al. [13] and Bodlaender [7]. Having small (polynomial) kernels
provides a formalization of efficient data reduction, and additionally, producing them often
requires significant insight into the combinatorial structure of a problem. 

Accordingly, the search for more and better kernelizations has evolved into a main branch 
of parameterized complexity (in fact, the opinion has been raised that kernelization is what 
fixed-parameter tractability is really about [22]). In particular, the existence of a polynomial 
kernelization for a problem is seen as a significant threshold, comparable to the existence of an 
FPT algorithm in the first place. Recent seminal work of Bodlaender et al. [8] and Fortnow and
Santhanam [27] reinforced the importance of this threshold by providing techniques to show that
certain problems do not admit polynomial kernels unless $\mathrm{NP} \subseteq \mathrm{coNP/poly}$ and the polynomial
hierarchy collapses to its third level (see also Harnik and Naor [35] for a related question).
Furthermore, a paper by Dell and van Melkebeek [TH] was the first work to provide lower 
bounds for the degree of a polynomial kernelization; among other things, their work implies 
an 0{k'^) lower bound for Feedback Vertex Set and Odd Cycle Transversal. Another 
recent focus has been meta kernelizations, i.e., meta-level results that provide kernelizations for
a large range of problems, under restrictions on the input [9, 26]; see below.

Still, for all this work, some problems have so far resisted classification with respect to exis- 
tence of polynomial kernels. Among these, emerging as the most important and most frequently 
raised questions - e.g., the two problems singled out as having the highest importance at the 
recent workshop on kernelization, WorKer 2010 - is the existence of polynomial kernelizations 
for the problems Odd Cycle Transversal (OCT) and Directed Feedback Vertex Set 
(DFVS). Both problems are also open even in the restricted case of planar graphs [10]. In this
paper, we focus on OCT, where the question was first raised in [33]; see also the recent survey
on lower bounds for kernelization by Misra et al. [48].

The Odd Cycle Transversal problem. The Odd Cycle Transversal problem asks whether
a given graph G can be made bipartite by deleting at most k of its vertices. Together with
natural variants such as Edge Bipartization, the edge deletion version, and Balanced Subgraph,
the problem of removing odd-parity cycles in signed graphs, this problem has numerous
applications (see, e.g., [571 [38]), and has received significant research attention [551 [331 [Ml [311
[571 H^ [58l I40j. With respect to parameterized and exact computation the breakthrough result
was the $O(4^k kmn)$ time algorithm by Reed, Smith, and Vetta [55].² This was the first
occurrence of the technique now called iterative compression. (Note that this is entirely distinct
from the notion of instance compression, which has been used as a term to generalize
kernelization; see related work below.) This technique also led to the first FPT algorithms for
Directed Feedback Vertex Set by Chen et al. [18] and Almost 2-SAT by Razgon and
O'Sullivan [53], and has become an important tool in parameterized complexity; see the survey
by Guo et al. [M]. For Edge Bipartization, an $O(2^k m^2)$ time algorithm exists due to Guo
et al. [33]. The best known approximation result for OCT is $O(\sqrt{\log n})$ due to Agarwal et
al. [1], improving on earlier results with a ratio of $O(\log n)$ p9j. For Edge Bipartization
there is also an $O(\log \mathrm{OPT})$-approximation by Avidor and Langberg [4]. Under the unique
games conjecture, neither problem can have a constant-factor approximation [41].

¹ for any decidable problem, can be easily verified, although here the bound gets worse.

What makes OCT special? The OCT problem belongs to the class of graph modification 
problems, i.e., finding a minimum number of modifications of vertices and/or edges in a given 
graph to achieve a given property (like bipartiteness). If the target property is sparse (e.g., 
being a forest for Feedback Vertex Set), or if it can be defined by a constant number of 
forbidden structures (e.g., induced paths on three vertices for Cluster Editing), then many 
such problems are known to have a polynomial kernel, although the kernelization is by no means 
always easy. On the other hand, only few polynomial kernels are known for target properties 
that do not have either characteristic. One candidate is deleting at most k arcs to remove all 
directed cycles from a tournament³, but here it suffices to consider the directed triangles [5].
Also, Chordal Completion has a polynomial kernel [19], but here, a large obstacle is helpful
in that a chordless cycle of length t requires at least $t - 3$ edges to be added. No such useful
exception exists for OCT. 

Additionally, there are involved meta-results for graph problems restricted to sparse inputs
like planar, bounded genus, and H-minor free graphs [9, 26]. However, it can be easily seen
that OCT is neither compact nor quasi-compact ([9]), as YES-instances do not have bounded
treewidth (take an arbitrarily large grid graph and add a few edges that cause odd cycles);
similarly it is not minor bidimensional ([2B]) as the cost on a grid is zero. Thus, even for planar
inputs, it is not covered by any known meta-result. Generally, similar statements as the above 
can be made about DFVS. 

Our work. In this paper, we give a randomized polynomial kernelization for OCT, for unrestricted
input. The kernel takes the form of a compression of the instance into a polynomial-sized
matrix (with bounded entry lengths), such that the independent sets of columns in the matrix
reveal whether the instance is positive or negative. By the NP-completeness of OCT, this then
implies a randomized polynomial kernel. The result is produced by combining the iterative
compression step of Reed, Smith, and Vetta [55] (in a suitable variant) with the theory of linearly
represented matroids, specifically the class called gammoids [53]. We observe that, given
a graph G and a set of terminals X, a single gammoid can be used to produce a matroid that
encodes the flow from S to T in $G - R$ for arbitrary $S, T, R \subseteq X$, and show (using results of
Marx [16]) how to produce a matrix representing this matroid, of total size cubic in $|X|$ and
logarithmic in $|G|$, in randomized polynomial time. Having access to this information is sufficient
to simulate the algorithm of [55]. Here, X is an initial approximate solution to the problem.
To get this initial set X, we use the $O(\sqrt{\log n})$-approximation of Agarwal et al. [1] to produce
² Hüffner [35] observed that the analysis could be improved to yield $O(3^k kmn)$.

³ A directed graph is a tournament if for every pair of its vertices, exactly one of the two possible arcs is present.



an $O(\sqrt{\mathrm{OPT}})$-approximation, giving an $O(k^{4.5})$-size randomized polynomial compression and
completing the kernelization. This approach of applying matroid theory to kernelization is a 
new tool in the field, and should prove useful for several future kernelization results as well. 
This is also one of very few randomized polynomial kernelizations (we are aware only of Harnik 
and Naor's probabilistic compression for Subset Sum [35]). Our result also implies an $\tilde{O}(k^3)$
compression for the Tanglegram Layout problem from computational biology [23] via the
reduction given in [6]; a polynomial kernel for this problem was left open in [6].

Related work. Generally, not much is known yet about excluding polynomial kernels for graph
modification problems, compared to the wide range of problems that belong to this class. Although
a few kernelization lower bounds [20l SH [191 [32] and FPT infeasibility results [33] exist,
for many problems in this class, including Odd Cycle Transversal and Directed Feedback
Vertex Set, the question of polynomial kernels is open. Another related set of problems
is graph separation problems; here as well, there are many FPT problems where the existence
of polynomial kernels is unknown, such as Multiway Cut [l5l[T7] and Multicut [T2|li7].

Kernels for OCT for non-standard parameters are studied by Jansen and Kratsch [52]; a
polynomial kernel is obtained for the case that one is given an OCT instance $(G, k)$ as well as a
set X such that $G - X$ is both bipartite and of bounded treewidth, with parameter $|X|$. The paper
also contains related lower bounds (e.g., if $G - X$ has treewidth 2 but is not necessarily bipartite).

Harnik and Naor [35] raised the question of compression of NP instances with respect to the
witness size (e.g., size $O(k \log n)$ for specifying a subset of size k). As they note, polynomial
kernelization with the witness size as parameter is equivalent to their notion of deterministic
compression. See Fortnow and Santhanam [27] for further discussion comparing the approaches
(and note also that the factor of $\log n$ can be absorbed into k if a problem is FPT with running
time 2^ n"^). Both Harnik and Naor [35] and Fortnow and Santhanam [27] also give notions
of probabilistic compression; the one we use is closer to [27], though we restrict ourselves to
one-sided error (see Section 2). However, as there are almost no further examples of randomized
polynomial kernelization or compression, there will be plenty of time for settling notation later.
In related work, compressions are instead called bikernels [3] or generalized kernelizations [8].

Connections between parameterized complexity and matroid theory have previously been
studied by Marx [16], including a self-contained description of representation tools and issues
for matroids. For more on matroid theory and algorithmic aspects see Oxley [52] as well as
Schrijver [56].

2 Preliminaries 

2.1 Parameterized complexity and kernelization 

We use the following standard notation from parameterized complexity; for more background
on this area we refer the reader to [2H [25| 151]. A parameterized problem over alphabet $\Sigma$
is a language $Q \subseteq \Sigma^* \times \mathbb{N}$; the second component of instances $(x, k) \in \Sigma^* \times \mathbb{N}$ is called the
parameter. A kernelization (or kernel) of Q is a polynomial-time computable mapping
$K : \Sigma^* \times \mathbb{N} \to \Sigma^* \times \mathbb{N} : (x, k) \mapsto (x', k')$ such that $(x, k) \in Q$ if and only if $(x', k') \in Q$ and
with $|x'|, k' \le h(k)$ where h is a computable function; h is called the size of the kernel. A
kernelization is a polynomial kernelization if the size $h(k)$ is polynomially bounded.

We use the term (parameterized) compression of Q (into Q') to denote the relaxed variant
where K is allowed to map to a different language Q' (also called bikernel [3] or generalized
kernelization [8]). When Q' is in NP and Q with parameter coded in unary is NP-complete then,
using the implicit Karp reduction, a polynomial compression of Q into Q' implies a polynomial
kernelization for Q [11].

We define a natural randomized version of kernelization with one-sided error, corresponding 
to the complexity class coRP (variants for RP and BPP could be defined similarly) . Our notion 
of polynomial coRP-compression is essentially equivalent to that of probabilistic compression 
in [27], except for our one-sided error, and that [27] defines the parameter to be given in unary. 

Definition 1 (coRP-kernelization). Let $Q \subseteq \Sigma^* \times \mathbb{N}$. A randomized polynomial-time
algorithm K with inputs and outputs in $\Sigma^* \times \mathbb{N}$ is a randomized kernelization without false
negatives, or coRP-kernelization, for Q if there is a computable function $h : \mathbb{N} \to \mathbb{N}$ such that for
all $(x, k) \in \Sigma^* \times \mathbb{N}$:

1. if $(x, k) \in Q$ then $\mathrm{prob}[K((x, k)) \in Q] = 1$,

2. if $(x, k) \notin Q$ then $\mathrm{prob}[K((x, k)) \notin Q] \ge \frac{1}{2}$, and

3. the size of $x'$ and the value of $k'$ are bounded by $h(k)$, where $(x', k') := K((x, k))$.

The notions of coRP-compression and polynomial coRP-compression are defined in the natural
way.

Note that unlike algorithms for BPP, RP, or coRP we cannot use majority, disjunction,
or conjunction over the outputs of t independent runs to boost the success probability, since
kernelizations and compressions typically do not solve instances. Harnik and Naor [35] observed
that a similar effect may be attained by making a combined instance from the result of t
independent runs of the compression (e.g., in our setting, creating an output which is to be
interpreted as $((x_1, k_1) \in Q) \wedge \ldots \wedge ((x_t, k_t) \in Q)$). Strictly speaking, this approach gives a
compression, but it can again be turned into a kernelization by the argument via the Karp
reduction.
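
As a small illustration of this combined-instance idea, the following Python sketch computes how many independent runs suffice for a target error, assuming (as in Definition 1) that a single run maps a NO-instance to a YES-instance with probability at most 1/2. The helper names `runs_needed` and `combined_answer` are ours and purely hypothetical; they only model the arithmetic, not an actual compression.

```python
def runs_needed(target_error, per_run_error=0.5):
    """Smallest t such that per_run_error**t <= target_error."""
    if not (0 < target_error < 1 and 0 < per_run_error < 1):
        raise ValueError("error bounds must lie strictly between 0 and 1")
    t, err = 0, 1.0
    while err > target_error:
        err *= per_run_error   # one more independent run
        t += 1
    return t


def combined_answer(answers):
    """Read t outputs as one conjunction: YES only if every run says YES.

    A YES-instance is never lost (no false negatives); a NO-instance is
    misclassified only if all t runs err simultaneously.
    """
    return all(answers)
```

Under these assumptions, ten runs already bring the one-sided error down to at most 1/1024.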

2.2 Matroids 

Matroids are interesting combinatorial structures, generalizing the notion of independence from 
linear algebra, while also drawing from graph theory. There is an extensive theory of matroids, 
as well as several important algorithmic results; see Oxley [52] and Schrijver [56]. 

A matroid is a pair $M = (E, \mathcal{I})$, where E is the ground set and $\mathcal{I} \subseteq 2^E$ a collection of
independent sets, such that: (i) $\emptyset \in \mathcal{I}$; (ii) if $I_1 \subseteq I_2$ and $I_2 \in \mathcal{I}$, then $I_1 \in \mathcal{I}$; and (iii)
if $I_1, I_2 \in \mathcal{I}$ and $|I_2| > |I_1|$, then there exists some $x \in (I_2 \setminus I_1)$ such that $I_1 \cup \{x\} \in \mathcal{I}$. A
set $I \subseteq E$ is independent if $I \in \mathcal{I}$, and dependent otherwise. A set $B \in \mathcal{I}$ is a basis of M if no
proper superset of B is independent; a matroid may equivalently be defined by its set of bases (among
other variants). Let $\mathcal{B}$ be the set of bases of M, and $\mathcal{B}^* = \{E \setminus B : B \in \mathcal{B}\}$. Then $\mathcal{B}^*$ is the set
of bases of a matroid $M^*$, called the dual of M. Note that $(M^*)^* = M$.
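
For very small ground sets the axioms and the dual can be checked by brute force. The sketch below is only an illustration on explicitly listed independent sets (exponential in $|E|$); the function names are our own.

```python
from itertools import combinations


def powerset(E):
    E = list(E)
    return [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]


def is_matroid(E, indep):
    """Check axioms (i)-(iii) for a family of frozensets over ground set E."""
    E, indep = frozenset(E), set(indep)
    if frozenset() not in indep or not all(I <= E for I in indep):
        return False
    # (ii) hereditary: removing one element at a time suffices by induction
    if any(I - {x} not in indep for I in indep for x in I):
        return False
    # (iii) exchange property
    for I1 in indep:
        for I2 in indep:
            if len(I2) > len(I1) and not any(I1 | {x} in indep for x in I2 - I1):
                return False
    return True


def bases(indep):
    """In a matroid all maximal independent sets have the same size."""
    r = max(len(I) for I in indep)
    return {I for I in indep if len(I) == r}


def dual_bases(E, indep):
    E = frozenset(E)
    return {E - B for B in bases(indep)}


def indep_from_bases(Bs):
    return {S for B in Bs for S in powerset(B)}
```

For instance, for the uniform matroid on three elements whose independent sets have size at most one, the dual has independent sets of size at most two, and dualizing again returns the original family.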

Let A be a matrix over a field F and E be the set of columns of A. Let $\mathcal{I}$ be the set of all
sets $X \subseteq E$ of columns that are linearly independent over F (as vectors). Then $(E, \mathcal{I})$ defines
a matroid M, and we say that A represents M. A matroid is representable (over a field F) if
there is a matrix (over F) that represents it. A matroid representable over some field is called
linear. In this work, we will concern ourselves only with linear matroids. From a representation
of M, one can easily get a representation of $M^*$ over the same field.
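
Independence in a represented matroid is a rank computation. The following sketch tests a set of columns over a prime field GF(p) by Gaussian elimination; it assumes p is prime and uses Python's three-argument `pow` with exponent -1 (available from Python 3.8) for modular inverses. The function names are illustrative only.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix (list of rows) over GF(p), p prime."""
    M = [[x % p for x in row] for row in rows]
    ncols = len(M[0]) if M else 0
    rank = 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(M)) if M[r][c]), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], -1, p)      # modular inverse of the pivot
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c]:
                f = M[r][c]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank


def columns_independent(A, cols, p):
    """True if the chosen columns of A are linearly independent over GF(p)."""
    sub = [[row[c] for c in cols] for row in A]
    return rank_mod_p(sub, p) == len(cols)
```

Note that independence depends on the field: with `A = [[1, 0, 1, 1], [0, 1, 1, 2]]`, columns 0 and 3 are independent over GF(5) but coincide over GF(2).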

Finally, we define minors of a matroid. For a matroid $M = (E, \mathcal{I})$ and a set $T \subseteq E$, deleting T
results in a matroid $M \setminus T = (E \setminus T, \mathcal{I}')$ where $\mathcal{I}' = \{I \in \mathcal{I} : I \subseteq E \setminus T\}$. Contracting T results in
a matroid $M/T = (M^* \setminus T)^*$; if $T \in \mathcal{I}$, then the independent sets of $M/T$ are the sets $X \subseteq E \setminus T$
such that $X \cup T \in \mathcal{I}$. A minor of a matroid M is any matroid produced from M by deletions
and contractions. Both operations can be performed with preserved representation.
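
The two descriptions of contraction can be compared on a toy example. The sketch below works on explicit families of independent sets (so it is exponential and meant only for tiny ground sets) and implements contraction literally as $(M^* \setminus T)^*$; all names are hypothetical.

```python
from itertools import combinations


def subsets(S):
    S = list(S)
    return {frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)}


def dual(E, indep):
    """Independent sets of the dual: subsets of complements of bases."""
    E = frozenset(E)
    r = max(len(I) for I in indep)
    co_bases = {E - B for B in indep if len(B) == r}
    return {S for B in co_bases for S in subsets(B)}


def delete(E, indep, T):
    """M \\ T: keep the independent sets that avoid T."""
    T = frozenset(T)
    return frozenset(E) - T, {I for I in indep if not (I & T)}


def contract(E, indep, T):
    """M / T computed as (M* \\ T)*."""
    E2, d = delete(E, dual(E, indep), T)
    return E2, dual(E2, d)
```

For an independent set T, the result agrees with the direct characterization via $X \cup T \in \mathcal{I}$, as the uniform matroid of rank 2 on four elements shows.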

3 Polynomial encoding of terminal cuts using gammoids 

The basic situation that is handled in this section is the following. Let $D = (V, A)$ be a directed
graph, and let $X \subseteq V$ be a set of terminals. We want to reduce the graph to a size polynomial
in $|X|$ and $\log |V|$, while preserving the size of a minimum vertex cut $(S, T)$ for all sets $S, T \subseteq X$.
Here, a vertex cut is understood as being allowed to delete vertices of S or T as well as other
vertices of V; thus the min-cut sizes are bounded by $|X|$. As an extension, we will also require
that we may specify any set $R \subseteq X$ as removed, i.e., we want to have also the cuts $(S, T)$
in $D - R$.

Clearly, this question is closely connected to the search for polynomial kernels for FPT cut 
problems. However, a direct combinatorial reduction to achieve this, e.g., via edge contractions 
and vertex deletions or other direct simplifications on the graph, seems difficult. It is not even 
clear whether there always exists a graph of the required size, where every minimum $(S, T)$-cut
for $S, T \subseteq X$ has the same cardinality as in $D - R$. Instead, we here solve the question by
introducing the use of matroids and matroid representations to the field of kernelization. 

Let us recall a few helpful definitions. For $S, T \subseteq V$, the set T is linked to S if there exist $|T|$
vertex-disjoint paths from S to T, where also the end points of the paths must be disjoint. The
sets S and T do not need to be disjoint; a vertex is linked to itself by a path of length zero.
By the cut-flow duality, it is clear that being able to find the linked subsets of X will suffice
to answer all questions about cuts $(S, T)$ in D. Perfect [53] showed, given any $D = (V, A)$
and $S \subseteq V$, that the subsets $T \subseteq V$ which are linked to S in D form a matroid $(D, S)$, of a class
now called gammoids (see [52, 56]). Marx [16] gave a randomized polynomial-time procedure
for finding a representation of this matroid.
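
Linkedness itself is a plain flow question. As a minimal sketch (not the representation procedure referred to above), the following Python code decides whether T is linked to S by computing a maximum set of vertex-disjoint paths with unit vertex capacities; it assumes the digraph has no self-loops, and the function names are ours.

```python
from collections import defaultdict, deque


def max_vertex_disjoint_paths(arcs, S, T):
    """Maximum number of vertex-disjoint directed S-T paths.

    Every vertex v is split into ("in", v) -> ("out", v) with capacity 1.
    Assumes there are no self-loops (v, v) among the arcs.
    """
    cap, adj = defaultdict(int), defaultdict(set)

    def add(u, v):
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)          # residual direction

    nodes = set(S) | set(T) | {x for a in arcs for x in a}
    for v in nodes:
        add(("in", v), ("out", v))
    for u, v in arcs:
        add(("out", u), ("in", v))
    for v in set(S):
        add("s", ("in", v))
    for v in set(T):
        add(("out", v), "t")

    flow = 0
    while True:
        parent, queue = {"s": None}, deque(["s"])
        while queue and "t" not in parent:      # BFS for an augmenting path
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if "t" not in parent:
            return flow
        w = "t"
        while parent[w] is not None:            # push one unit of flow
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1


def is_linked(arcs, S, T):
    """T is linked to S iff |T| vertex-disjoint paths from S to T exist."""
    return max_vertex_disjoint_paths(arcs, S, T) == len(set(T))
```

A vertex in both S and T is linked to itself by the zero-length path, which the model captures through the split vertex alone.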

Theorem 1 ([53, 16]). Let $D = (V, A)$ be a directed graph, and let $S \subseteq V$. The subsets $T \subseteq V$
which are linked to S form the independent sets of a matroid over V. Furthermore, a representation
of this matroid can be obtained in randomized polynomial time with one-sided error.

Here, one-sided error means that dependent (non-linked) sets are preserved, but independent 
(linked) sets in the graph may not be, i.e., if the procedure returns a matrix A, then there 
may be some subsets of T which are linked to S but which are not independent in the matroid 
represented by A. However, the risk of this can be made arbitrarily small. 

It remains for us only to bound the bit-length of the entries of the matrix (which would 
otherwise be polynomial in $|V|$). This is easily done by standard methods.

Corollary 1. Let $D = (V, A)$ be a directed graph, $\epsilon > 0$ a given real, and let S and T be possibly
overlapping subsets of V. Let M be the gammoid formed by subsets of T linked to S.
A representation of M as an $|S| \times |T|$ matrix over the rationals with entries of bit-length
$O(\min(|T|, |S| \log |T|) + \log(1/\epsilon) + \log |V|)$ can be computed in randomized polynomial time
with one-sided error at most $\epsilon$.

Proof. Theorem 1 can be made to return an $|S| \times |V|$ matrix over the rationals, with arbitrarily
small one-sided error $\epsilon' > 0$ and individual entries being integers of bit-length polynomial in $|V|$
and $\log 1/\epsilon'$. Let $\epsilon' = \epsilon/2$, and let A be the matrix returned, with columns not in T removed.
To reduce the length of the entries, we take all entries in A modulo a sufficiently large random
prime p. We argue that the matrix A' produced this way satisfies all conditions.



Consider an independent column set in A. Since independence corresponds to a square
submatrix with non-zero determinant, we see that A and A' differ in this aspect only if p
divides said determinant. The number of distinct prime factors in a number is bounded by
the bit-length, which for our determinants is polynomially bounded in $|V| + \log 1/\epsilon$. Since the
number of maximal independent sets in M is bounded by both $|T|^{|S|}$ and by $2^{|T|}$, the total
number of distinct primes is bounded as $t = \min(|T|^{|S|}, 2^{|T|}) \cdot (|V| + \log 1/\epsilon)^{O(1)}$. Thus, if we pick
a random prime from a set of size at least $t' = (2/\epsilon) \cdot t$, the total risk of failure is bounded by $\epsilon$.
By the Prime Number Theorem, primes of bit length $O(\log(t' \log t')) = O(\log t + \log 1/\epsilon)$ are
sufficient for this, which matches the statement of the corollary. By the AKS primality testing
algorithm [2], finding a random prime can be done with high probability by repeated uniform
sampling, and if this fails we may pick an arbitrary fixed prime. It can be seen that errors
throughout are one-sided. □
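
The effect of the modular reduction can be seen on a single determinant. The sketch below is our own illustration (it is not the procedure of Theorem 1): it computes an exact integer determinant by fraction-free elimination and shows that reducing modulo a prime can destroy independence only when the prime divides the determinant, never create it.

```python
def det_int(M):
    """Exact determinant of a square integer matrix (Bareiss elimination)."""
    M = [row[:] for row in M]
    n = len(M)
    sign, prev = 1, 1
    for k in range(n - 1):
        if M[k][k] == 0:
            swap = next((i for i in range(k + 1, n) if M[i][k] != 0), None)
            if swap is None:
                return 0
            M[k], M[swap] = M[swap], M[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # the division by the previous pivot is always exact
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
        prev = M[k][k]
    return sign * M[n - 1][n - 1]


def independent_mod_p(M, p):
    """Are the columns of the square matrix M independent after reducing mod p?"""
    return det_int([[x % p for x in row] for row in M]) % p != 0


def distinct_prime_factors(n):
    """Primes dividing n; these are the only 'bad' moduli for a determinant n."""
    n, d, out = abs(n), 2, set()
    while d * d <= n:
        while n % d == 0:
            out.add(d)
            n //= d
        d += 1
    if n > 1:
        out.add(n)
    return out
```

For a matrix with determinant 25, the only bad prime is 5, and the number of bad primes is in general at most the bit-length of the determinant.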

The following proposition extends the available gammoid structure to allow any subset S 
of the terminals as sources (i.e., without fixing it in advance), and to support also deletion of 
terminals. This will be our interface for using Corollary 1 in the Odd Cycle Transversal
kernelization. The argument is straightforward and works as well for a given directed graph. 

Proposition 1. Let $G = (V, E)$ be an undirected graph, let $X \subseteq V$ be a set of terminals,
and let $X' := \{v' \mid v \in X\}$ be a set of new vertices. There is a polynomial-time construction
of a directed graph $D = (V \cup X', A)$ such that $I \subseteq X \cup X'$ is an independent set of the
gammoid $(D, X')$ if and only if T is linked to S in $G - R$ where

• S contains all vertices $v \in X$ with $v, v' \notin I$,

• T contains all vertices $v \in X$ with $v, v' \in I$, and

• R contains all vertices $v \in X$ with $v \in I$ but $v' \notin I$.

Proof. The arc set A of the digraph D is defined as follows: For any two adjacent vertices $u, v \in V$
add $(u, v)$ and $(v, u)$ to A. Then add an arc $(v', v)$ for all $v \in X$ (and corresponding $v' \in X'$).

We consider first any independent set $I \subseteq X \cup X'$ of the gammoid $(D, X')$. There are $|I|$
vertex-disjoint directed paths from X' to I in D; fix any such packing $\mathcal{P}$ of directed paths. By
the structure of D all paths of $\mathcal{P}$ are either of form $(u')$ with $u' \in X'$, or of form $(u', u, \ldots, v)$
(possibly with $u = v$) and containing no vertices of $X' \setminus \{u'\}$. Let $P = (u', u, \ldots, v) \in \mathcal{P}$
with $u' \in X'$ and $v \in T$, i.e., with $v, v' \in I$. Clearly, $u' \neq v'$ since $(v') \in \mathcal{P}$ is the unique directed
path from X' to v' and must be contained in $\mathcal{P}$ as $v' \in I$ (and using vertex-disjointness). Since u
and u' are on P, no other path of $\mathcal{P}$ can end in u or u', thus $u, u' \notin I$ and hence $u \in S$. Finally,
no vertex $p \in R$ can be on P since, by $p \in I$, that requires another path of $\mathcal{P}$ to end in p. Now,
the subpath $(u, \ldots, v)$ contained in $D - X'$ corresponds to an undirected path from S to T
in $G - R$, and all those paths are vertex-disjoint.

Now, let $\mathcal{P}$ be a set of $|T|$ vertex-disjoint paths from S to T in $G - R$. We construct a set $\mathcal{P}'$
of directed vertex-disjoint paths from X' to I in the digraph D where I is obtained according
to the statement of the proposition. For each path $(u, \ldots, v) \in \mathcal{P}$ we add the path $(u', u, \ldots, v)$
to $\mathcal{P}'$. Clearly, those paths exist in D and they are vertex-disjoint. We also require paths ending
in the vertices of $I \setminus T$. This includes vertices $r \in R$ and vertices $v'$ with $v \notin S \cup R$. It is easy to
see that adding paths $(r', r)$ and $(v')$, respectively, for all those vertices yields the required path
packing $\mathcal{P}'$ (key fact: in the initial $|T|$ paths we only used $v'$ when $v \in S$, and vertices $r \in R$
were unused). Thus I is an independent set of the gammoid $(D, X')$. □
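
The construction in Proposition 1 is simple enough to check exhaustively on toy graphs. The sketch below builds D, decodes a set I into (S, T, R), and compares both sides of the equivalence using a unit-capacity flow. It is an illustration under our own modelling choices: the copy $v'$ is represented as the pair `("copy", v)`, graphs are assumed loop-free, and all function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def disjoint_paths(arcs, S, T):
    """Maximum number of vertex-disjoint directed S-T paths (no self-loops)."""
    cap, adj = defaultdict(int), defaultdict(set)

    def add(u, v):
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)

    for v in set(S) | set(T) | {x for a in arcs for x in a}:
        add(("in", v), ("out", v))
    for u, v in arcs:
        add(("out", u), ("in", v))
    for v in set(S):
        add("s", ("in", v))
    for v in set(T):
        add(("out", v), "t")
    flow = 0
    while True:
        parent, queue = {"s": None}, deque(["s"])
        while queue and "t" not in parent:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if "t" not in parent:
            return flow
        w = "t"
        while parent[w] is not None:
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1


def build_D(edges, X):
    """Arcs of D: both orientations of each edge, plus (v', v) for v in X."""
    arcs = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    return arcs | {(("copy", v), v) for v in X}


def decode(I, X):
    """Split the terminals into (S, T, R) as in the statement."""
    S = {v for v in X if v not in I and ("copy", v) not in I}
    T = {v for v in X if v in I and ("copy", v) in I}
    R = {v for v in X if v in I and ("copy", v) not in I}
    return S, T, R


def proposition_holds(edges, X):
    """Brute-force check of the equivalence over all I (tiny inputs only)."""
    D, copies = build_D(edges, X), [("copy", v) for v in X]
    ground = list(X) + copies
    for r in range(len(ground) + 1):
        for I in map(set, combinations(ground, r)):
            S, T, R = decode(I, X)
            lhs = disjoint_paths(D, copies, I) == len(I)
            rest = [(a, b) for u, v in edges if u not in R and v not in R
                    for a, b in ((u, v), (v, u))]
            if lhs != (disjoint_paths(rest, S, T) == len(T)):
                return False
    return True
```

On small examples such as a 4-cycle with two terminals or a star with three terminal leaves, the check confirms the statement for every subset I.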



Since it appears a very useful form, e.g., for obtaining polynomial kernels for other cut
problems, we explicitly state the combination of Proposition 1 and Corollary 1 as a corollary.

Corollary 2. Let $G = (V, E)$ be an undirected graph, $X \subseteq V$ a set of terminals, and $\epsilon$ a positive
real. There is a randomized polynomial-time algorithm computing an $|X| \times 2|X|$ matrix M with
integer entries of bit length $O(|X| + \log |V| + \log 1/\epsilon)$, such that with probability at least $(1 - \epsilon)$ any
set of columns $I \subseteq X \cup X'$ is independent in M if T is linked to S in $G - R$, where $S, T, R \subseteq X$
are defined as in Proposition 1. The error is one-sided: the number of disjoint paths as indicated
by independence is a lower bound on the true value in G.

4 A randomized polynomial kernel for odd cycle transversal 

In this section, using the results presented in the previous section, we will give our randomized 
polynomial kernelization for Odd Cycle Transversal. We will start by a presentation of 
the FPT algorithm of Reed, Smith, and Vetta [55], as understanding this algorithm is critical
to understanding the kernelization. This is presented in Section 4.1. Then we present the
kernelization in Section 4.2 and finally discuss the relation with lower bounds in Section 4.3.

4.1 The Reed-Smith-Vetta algorithm 

The FPT algorithm of Reed, Smith, and Vetta [55] solves Odd Cycle Transversal by a
recursive approach: Solve the problem for $G - v$, where v is an arbitrary vertex. If it returns
a solution $X_v$ of size at most k, then $X := X_v \cup \{v\}$ is a solution of size at most $k + 1$ for G,
and the following compression version of the problem is solved. Otherwise $(G - v, k)$ is NO,
thus $(G, k)$ must be NO.

Input: A graph $G = (V, E)$, an integer k, and a bipartization set X of size $k + 1$.

Parameter: k.

Question: Is there a bipartization set Y for G such that $|Y| \le k$?
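
The outer loop of this scheme can be sketched as follows. Note that the `compress` function here is only a brute-force stand-in of our own (it tries all vertex subsets of size at most k) and is not the flow-based compression routine of [55]; the sketch is meant to show how the solution of size at most $k + 1$ is maintained, and is usable only on tiny graphs.

```python
from itertools import combinations


def is_bipartite(vertices, edges):
    """2-colour the subgraph induced by `vertices`; edges leaving it are ignored."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].append(v)
            adj[v].append(u)
    color = {}
    for s in adj:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    stack.append(w)
                elif color[w] == color[u]:
                    return False
    return True


def compress(vertices, edges, X, k):
    """Stand-in compression step: brute force, NOT the routine of [55]."""
    if len(X) <= k:
        return set(X)
    for r in range(k + 1):
        for Y in combinations(vertices, r):
            if is_bipartite([v for v in vertices if v not in Y], edges):
                return set(Y)
    return None


def oct_iterative_compression(vertices, edges, k):
    """Return a bipartization set of size <= k, or None if none exists."""
    current, X = [], set()
    for v in vertices:
        current.append(v)
        X = compress(current, edges, X | {v}, k)   # X | {v} has size <= k + 1
        if X is None:
            return None   # an induced subgraph is already a NO-instance
    return X
```

The unrolled loop is equivalent to the recursion in the text: the solution for the graph without v, extended by v, is the input of the compression step.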

The compression routine in the Reed-Smith-Vetta algorithm consists of trying exhaustively
all ways of how the set X could interact with a smaller solution Y, each coming down to a
maximum flow computation. Concretely, we create a graph $G' = (V', E')$ from G and X in the
following way: let $S_1 \cup S_2$ be a bipartition of $G - X$. Let $V' = (V \setminus X) \cup \{x_1, x_2 : x \in X\}$,
where $x_1$ and $x_2$ are new vertices. Connect $x_1$ to all neighbors of x in $S_2$ and $x_2$ to all neighbors
of x in $S_1$. By subdividing edges, we may assume that there are no edges inside X. Note
that G' is bipartite with partitions $S_1 \cup \{x_1 : x \in X\}$ and $S_2 \cup \{x_2 : x \in X\}$. For $U \subseteq X$,
let $X'(U) := \{x_1, x_2 : x \in U\}$, and let $X' = X'(X)$.
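
A direct transcription of this construction might look as follows. The sketch assumes that edges inside X have already been subdivided (it raises an error otherwise), represents $x_1, x_2$ as the pairs `(x, 1)` and `(x, 2)`, and assumes that no original vertex label has this form; the function names are ours.

```python
from collections import deque


def two_color(vertices, edges):
    """Return side[v] in {1, 2} for a bipartite graph, else raise ValueError."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    side = {}
    for s in vertices:
        if s in side:
            continue
        side[s] = 1
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in side:
                    side[w] = 3 - side[u]
                    queue.append(w)
                elif side[w] == side[u]:
                    raise ValueError("G - X is not bipartite")
    return side


def build_aux_graph(vertices, edges, X):
    """Build G' from G and a bipartization set X with no edges inside X."""
    X = set(X)
    rest = [v for v in vertices if v not in X]
    inner = [(u, v) for u, v in edges if u not in X and v not in X]
    side = two_color(rest, inner)                 # S_1 and S_2
    new_edges = {frozenset(e) for e in inner}
    for u, v in edges:
        if u in X and v in X:
            raise ValueError("subdivide edges inside X first")
        if u in X or v in X:
            x, y = (u, v) if u in X else (v, u)
            # x_1 sees the neighbours in S_2, x_2 those in S_1
            copy = (x, 1) if side[y] == 2 else (x, 2)
            new_edges.add(frozenset((copy, y)))
    new_vertices = set(rest) | {(x, i) for x in X for i in (1, 2)}
    return new_vertices, new_edges, side
```

For a triangle with one vertex in X, the result is a path on four vertices, which is bipartite with the partition stated above.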

The algorithm searches for cuts through X' in G'. For a subset $U \subseteq X$, let a pair $(S, T)$
of disjoint subsets of X' be a valid split of U if for every $x \in U$ we have $|\{x_1, x_2\} \cap S| =
|\{x_1, x_2\} \cap T| = 1$ and for every $x \in X \setminus U$ we have $|\{x_1, x_2\} \cap S| = |\{x_1, x_2\} \cap T| = 0$. The
following lemma is a direct consequence of [55].



Lemma 1 ([55]). Let $G = (V, E)$ be a graph and let $X \subseteq V$ such that $G - X$ is bipartite. Let G'
be constructed from G and X as above. Let $\delta(H, S, T)$ denote the minimum size of an $(S, T)$
vertex cut in H. The minimum size of $Y \subseteq V$ such that $G - Y$ is bipartite equals the minimum
of $|X \setminus U| + \delta(G' - X'(X \setminus U), S, T)$ over all subsets U of X and all valid splits $(S, T)$ of U.

Clearly, given this result, one can find an optimal bipartization set for G by looping over
the $3^{|X|}$ options for U, S, and T. In particular, one is not limited to using X of size $k + 1$ but,
sacrificing runtime, a single run of the iterative compression routine on G and an approximate
solution X for G suffices. In the next subsection, this setting will be used to give a polynomial
kernelization of Odd Cycle Transversal.
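
The count of $3^{|X|}$ comes from three choices per terminal: x is left out of U, or $x_1$ goes to S and $x_2$ to T, or the other way round. A minimal enumeration sketch, with $x_i$ again modelled as the pair `(x, i)` (our own convention), is:

```python
from itertools import product


def valid_splits(X):
    """Yield every triple (U, S, T) with (S, T) a valid split of U."""
    X = list(X)
    for choice in product((0, 1, 2), repeat=len(X)):
        U, S, T = set(), set(), set()
        for x, c in zip(X, choice):
            if c == 0:
                continue                  # x is not in U
            U.add(x)
            if c == 1:
                S.add((x, 1))
                T.add((x, 2))
            else:
                S.add((x, 2))
                T.add((x, 1))
        yield U, S, T
```

Each yielded triple corresponds to one minimum vertex cut computation in Lemma 1.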

For proofs of Lemma 1 see [55] or one of the presentations subsequently given by other
authors [37, 44]. For variation, we will now sketch an alternative approach to showing the
result.

Instead of a graph, we view the OCT instance as a 2-SAT formula F containing only constraints
$(x = y)$ and $(x \neq y)$. Clearly, for any graph G, if we replace every edge $\{u, v\}$ in G
by a constraint $(u \neq v)$, then we get a formula F over V which is satisfiable if and only if G is
bipartite, and this holds for every induced subgraph of G as well. Thus, the problem reduces to
deleting k variables Z of F and their incident constraints such that the remaining formula $F - Z$
is satisfiable.
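
Satisfiability of such a system of equality and disequality constraints can be tested with a union-find structure that tracks parities. The following sketch is our own illustration of this view (class and function names are hypothetical); a constraint is a triple `(u, v, d)` with `d = 1` for $(u \neq v)$ and `d = 0` for $(u = v)$.

```python
class ParityDSU:
    """Union-find where each element stores its parity relative to its parent."""

    def __init__(self):
        self.parent, self.parity = {}, {}

    def find(self, v):
        """Return (root, parity of v relative to the root), compressing paths."""
        if v not in self.parent:
            self.parent[v], self.parity[v] = v, 0
        if self.parent[v] == v:
            return v, 0
        root, p = self.find(self.parent[v])
        self.parent[v] = root
        self.parity[v] ^= p
        return root, self.parity[v]

    def add(self, u, v, differ):
        """Add u != v (differ=1) or u = v (differ=0); False on contradiction."""
        ru, pu = self.find(u)
        rv, pv = self.find(v)
        if ru == rv:
            return (pu ^ pv) == differ
        self.parent[ru] = rv
        self.parity[ru] = pu ^ pv ^ differ
        return True


def satisfiable(constraints, deleted=()):
    """Is the formula satisfiable after deleting the given variables?"""
    dsu, deleted = ParityDSU(), set(deleted)
    return all(dsu.add(u, v, d) for u, v, d in constraints
               if u not in deleted and v not in deleted)


def bipartite_after_deleting(edges, Z):
    """G - Z is bipartite iff the formula with one (u != v) per edge is satisfiable."""
    return satisfiable([(u, v, 1) for u, v in edges], Z)
```

A triangle yields an unsatisfiable formula, which becomes satisfiable once any one of its variables is deleted.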

Now, observe that we can negate a variable v in F by changing $(u = v)$-constraints to $(u \neq v)$-constraints
and vice versa. This does not affect the satisfiability of $F - Y$ for any variable
set Y. Thus, negate variables in F so that $F - X$ is satisfied by the all-zero assignment. We
now observe that the only remaining disequality constraints $(x \neq y)$ of F are incident to X.
By deleting or assigning every $x \in X$, we create smaller formulas F' containing only equality
constraints $(u = v)$ and assignments $(u = 0)$ or $(u = 1)$. This trivially reduces to a vertex cut
problem in a graph, which would conclude the proof of the FPT result.

To get from here to the construction of G' above, consider the effects of splitting variables $x \in X$
in F into distinct variables $x$ and $\neg x$ representing its literals, and replace constraints $(x \neq y)$
by $(\neg x = y)$. Clearly, in any assignment we must require $(x \neq \neg x)$. Applying all this to the
bipartition $S_1 \cup S_2$ of $G - X$ will show the equivalence of the result.

4.2 Kernelization 

Now, we give the kernelization. We begin by describing a compression procedure for OCT,
by applying the matroid tools of Section 3 to Lemma 1 above. The result is a randomized
polynomial-time compression procedure with one-sided error, consisting of the following steps.
Let an instance $(G, k)$ of OCT and an error parameter $\epsilon > 0$ be given.

1. If $k < \log n$, run the Reed-Smith-Vetta algorithm in time $O(3^k kmn) = n^{O(1)}$ (polynomial
in n) and return a constant-size YES- or NO-instance accordingly.

2. Otherwise, if $k \ge \log n$, let X be an approximate solution of ratio $O(\sqrt{\log n}) = O(k^{1/2})$,
provided by an algorithm due to Agarwal et al. [1]. Unless $|X| = O(k^{1.5})$, answer NO as
there cannot be a solution of size at most k. (If $|X| \le k$ then answer YES.)

3. Create the auxiliary graph G' from G and X, as in Section 4.1. Let $X' = \{x_1, x_2 \mid x \in X\}$.

4. Apply Corollary 2 to G' with terminal set X' and error parameter $\epsilon/2$, creating a matrix A.

5. Output $(A, k)$ as a polynomial-sized compression.

The total coding size of the matrix A is cubic in |X| (up to factors logarithmic in 1/ε). By the
size guarantee in Step 2 we get an O(k^{4.5})-sized compression for OCT. Note that (A, k) may
be interpreted as an instance of an (artificial) decision problem (implicitly defined in the proof
of Lemma 2). Clearly, this problem is in NP, allowing us to reduce back to OCT to complete
the kernelization (using the implicit Karp reduction as discussed in Section 2.1).
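The size bound can be traced through a short calculation. The sketch below assumes that a solution of size at most k exists, that k > log n, and that the error parameter is chosen with log(1/ε) = O(k); this choice is ours for illustration, and it matches a failure probability exponentially small in k. The constant c is a placeholder for the hidden constant of the approximation ratio.

```latex
\begin{align*}
|X| &\le c\sqrt{\log n}\cdot \mathrm{OPT} \;\le\; c\sqrt{k}\cdot k \;=\; O(k^{1.5}), \\
\log(1/\varepsilon) &= O(k) = O(|X|)
   \quad\text{(Step 2 leaves only the case } |X| > k\text{)}, \\
\text{size of } A &= O\bigl(|X|^{2}\,(|X| + \log(1/\varepsilon))\bigr)
   \;=\; O(|X|^{3}) \;=\; O(k^{4.5}).
\end{align*}
```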

For Edge Bipartization, we may replace Steps 1 and 2 by the O(log OPT)-approximation
of Avidor and Langberg [4], followed by an easy reduction from Edge Bipartization to
OCT, for a compression of size Õ(k^3). It is an interesting question whether a polylog(OPT)-
approximation is possible for OCT, as this would give us an Õ(k^3)-sized compression for OCT.
Now we give our central compression result.

Lemma 2. Let (G, k) be an instance of OCT, X a bipartization set for G, and ε > 0 be given.
Then there is a randomized compression of (G, k) to size O(|X|^2 (|X| + log 1/ε)) with one-sided
error, producing no false negatives. The error probability is bounded by ε, and the running time
is polynomial in |G| and log 1/ε.

Proof. The algorithm proceeds as Steps 3-5 of the kernelization algorithm, creating first an
auxiliary graph G' with a terminal set X' of size 2|X|, and then invoking Corollary 2 on G', X',
and ε/2. Let the resulting matrix be A; our compression output is then (A, k). The running
time and output size are given by Corollary 2; we only have to argue that (A, k) contains
all the information needed to decide the status of the OCT instance (G, k). By Step 2 we
assume |X| > k.

Recall the definition X'(U) = {x_1, x_2 : x ∈ U} for U ⊆ X. By Lemma 1 we need for
all U ⊆ X the minimum (S, T) vertex cut size in G' − X'(X \ U), where S and T range over
all valid splits of U. Clearly, if the minimum (S, T) vertex cut size in G' − R is λ then there is
a set T' ⊆ T of size λ such that T' is linked to S in G' − R, i.e., such that there are λ = |T'|
vertex disjoint paths from S to T' in G' − R (using cut-flow duality). This can be obtained
from the matrix A by testing independence of all sets I which correspond (as in Corollary 2) to
choices S, T', R with T' ⊆ T and R = X'(X \ U). Note that whether an (S, T)-cut may delete S
and T makes no difference for the algorithm using Lemma 1.
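The cut-flow duality used here (Menger's theorem) can be made concrete: the minimum (S, T) vertex cut size equals the maximum number of vertex-disjoint paths from S to T. The sketch below computes that number by the standard node-splitting reduction to unit-capacity maximum flow. It is an illustration only; the function name and graph encoding are our assumptions, and it works on an explicit graph, whereas the compression reads the same quantities off the matrix A. In this sketch a cut is allowed to delete vertices of S and T.

```python
from collections import deque, defaultdict

def max_vertex_disjoint_paths(edges, S, T, removed=()):
    """Maximum number of vertex-disjoint S-T paths in an undirected graph minus `removed`."""
    removed = set(removed)
    cap = defaultdict(int)   # residual capacities, keyed by (tail, head)
    adj = defaultdict(set)   # residual adjacency (both directions)

    def add_arc(a, b):
        cap[(a, b)] += 1
        adj[a].add(b)
        adj[b].add(a)

    vertices = ({v for e in edges for v in e} | set(S) | set(T)) - removed
    for v in vertices:
        add_arc((v, "in"), (v, "out"))          # unit capacity per vertex
    for u, v in edges:
        if u in vertices and v in vertices:
            add_arc((u, "out"), (v, "in"))
            add_arc((v, "out"), (u, "in"))
    for s in S:
        if s in vertices:
            add_arc("source", (s, "in"))
    for t in T:
        if t in vertices:
            add_arc((t, "out"), "sink")

    flow = 0
    while True:
        parent = {"source": None}               # BFS for an augmenting path
        queue = deque(["source"])
        while queue and "sink" not in parent:
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if "sink" not in parent:
            return flow
        b = "sink"
        while parent[b] is not None:            # push one unit along the path
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        flow += 1

star = [("s1", "c"), ("s2", "c"), ("c", "t1"), ("c", "t2")]
print(max_vertex_disjoint_paths(star, {"s1", "s2"}, {"t1", "t2"}))  # 1: vertex c is a cut
```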

The behavior and one-sidedness of the error also follow from Corollary 2. □
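Testing independence of a set I in a matroid represented by a matrix amounts to a rank computation on the corresponding columns. The following is a minimal sketch, assuming a representation over a prime field GF(p); the field, the prime p = 7 and the tiny example matrix are our own choices for illustration, and the actual matrix A and its field come from Corollary 2.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over GF(p), by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if pivot is None:
            continue                                  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)                  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def is_independent(matrix, cols, p):
    """A column set is independent iff the chosen columns have full column rank."""
    cols = list(cols)
    sub = [[row[c] for c in cols] for row in matrix]
    return rank_mod_p(sub, p) == len(cols)

# Columns (1,0), (0,1), (1,1) over GF(7).
M = [[1, 0, 1],
     [0, 1, 1]]
print(is_independent(M, [0, 2], 7))     # True
print(is_independent(M, [0, 1, 2], 7))  # False: three columns in a rank-2 matrix
```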

With the previously described algorithm and the approximation results we get the following. 

Theorem 2. There is a randomized O(k^{4.5})-compression for Odd Cycle Transversal and
a randomized Õ(k^3)-compression for Edge Bipartization, with one-sided error with no false
negatives and failure probability exponentially small in k.

The target problems of the compressions are constrained minimization problems over the 
rank of a matroid. We omit the precise definitions, but it should be clear that the problems 
can be made well-defined, and that they are in NP. As discussed in Section 2.1, using
NP-completeness of Odd Cycle Transversal and Edge Bipartization, we get the following
polynomial coRP-kernelization results.

Corollary 3. Odd Cycle Transversal, Edge Bipartization, Balanced Subgraph,
and Tanglegram Layout (see [6]) have polynomial coRP-kernelizations.

4.3 A Note on Lower Bounds 

Finally, we remark that although kernelizations and kernelization lower bounds are usually 
expressed in terms of deterministic results, the type of randomized kernelizations we produce 
here (i.e., one-sided error with no false negatives) does in fact fit within the lower bounds 
framework of Bodlaender et al. [8] and Fortnow and Santhanam [27], as the proofs implicitly
exclude also co-nondeterministic kernels. Since our coRP kernelization is a special case of this,
our tools cannot be used to escape the lower bounds. Dell and van Melkebeek [19] noted the
connection to co-nondeterministic kernels, and brought it further by giving lower bounds in
terms of the amount of communication needed to solve a problem in an oracle setting, where a
polynomially bounded but co-nondeterministic player communicates with a computationally
unbounded oracle. They provide concrete lower bounds for various problems, among others
implying the following.

Theorem 3 ([19]). Let ε > 0. No co-nondeterministic kernel or compression for OCT can
achieve a total size of O(k^{2−ε}) unless NP ⊆ coNP/poly and the polynomial hierarchy collapses.

It seems difficult to go below an Õ(k^3) bound using our methods. Thus, we leave it as an open
question whether the upper or the lower bound on the total compression size can be improved 
for Odd Cycle Transversal and Edge Bipartization. 

5 Conclusion 

We have presented randomized polynomial kernelizations for Odd Cycle Transversal and
Edge Bipartization. The key contribution is the introduction of matroids into kernelization,
by encoding the compression step of the Reed-Smith-Vetta algorithm for Odd Cycle
Transversal [55] by means of a matroid. This leads to a compression of the problem into
size O(k^{4.5}) for Odd Cycle Transversal and Õ(k^3) for Edge Bipartization, which is easily
turned into a kernelization by back-reductions to the original problems. The kernelization
has one-sided error, producing no false negatives, and the failure rate can be made exponentially
small in k at only a constant factor cost to the size. While this essentially settles the
question about existence of polynomial kernels, the more practical result seems to be the output
of the compression. Not only is compression to any set the more robust notion (cf. [19]); the
target problem is native in one of the most well-studied areas of mathematics and computer
science. The compression may also point the way for where to look for direct, combinatorial
kernelizations for the problem.

It is interesting that the results of Fortnow and Santhanam [27] can be seen to exclude also 
co-nondeterministic compressions [19]. Thus, our technique is not a way of avoiding the lower 
bounds given by [8, 27], but a way of settling problems for which neither such lower bounds nor
a polynomial kernelization or compression are known. 

We close with some open problems. It is still interesting whether there exist deterministic
polynomial kernelizations for Odd Cycle Transversal and Edge Bipartization, either as
a derandomization of our methods, or (which would have independent interest) as a properly
combinatorial kernelization. Additionally, for both problems, there is the question of the correct
size bound. We note that an O(polylog(OPT))-approximation for OCT, which is consistent with
approximation theory lower bounds, would improve our result to Õ(k^3), but this still leaves a
gap to the O(k^{2−ε}) lower bound given by Dell and van Melkebeek [19]. Finally, existence of
(randomized) polynomial kernels is an exciting question for several related problems, including 
Directed Feedback Vertex Set, Multiway Cut, and Edge and Vertex Multicut. 
The robust way in which the introduction of matroids into kernelization helped to settle the 
question for Odd Cycle Transversal gives reason to believe that it will play a key role for 
some of these problems as well. 